Covid-19: a time series analysis and forecasting

In this notebook, we examine data on covid-19, the disease caused by the novel coronavirus (nCoV) behind the 2020 pandemic. The data is collected globally for every country (and, for some countries such as China, for each region) and is updated every 24 hours.

The dataset is available here: https://www.kaggle.com/imdevskp/corona-virus-report

Our goal is to determine the mortality rate per country over time, and visualise the curve of confirmed cases, deaths, recovered and active cases.

Finally, we will forecast our time series with Prophet to analyse the curves of confirmed infections and deaths in the UK and the US.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
import plotly.express as px

# Flourish embed
from IPython.display import Javascript, display, HTML
In [2]:
covid_19 = pd.read_csv('data/covid_19_clean_complete.csv', parse_dates=['Date'])
covid_19.head()
Out[2]:
Province/State Country/Region Lat Long Date Confirmed Deaths Recovered
0 NaN Afghanistan 33.0000 65.0000 2020-01-22 0 0 0
1 NaN Albania 41.1533 20.1683 2020-01-22 0 0 0
2 NaN Algeria 28.0339 1.6596 2020-01-22 0 0 0
3 NaN Andorra 42.5063 1.5218 2020-01-22 0 0 0
4 NaN Angola -11.2027 17.8739 2020-01-22 0 0 0
In [3]:
df1 = covid_19.groupby(["Date", "Country/Region"])['Confirmed'].sum().reset_index()
df_fluorish = df1.pivot(index='Country/Region', columns='Date').reset_index()
# df_fluorish.to_csv(r'data/covid_19_merged.csv', index = False)

Data preprocessing

We will only focus on:

  • Confirmed cases
  • Deaths
  • Recovered
  • Active cases

We can ignore the coordinates, fill in missing values, and aggregate the regions of each country where regional data is available.

In [4]:
# cases 
cases = ['Confirmed', 'Deaths', 'Recovered', 'Active']

# Active Case = confirmed - deaths - recovered
covid_19['Active'] = covid_19['Confirmed'] - covid_19['Deaths'] - covid_19['Recovered']

# replacing Mainland china with just China
covid_19['Country/Region'] = covid_19['Country/Region'].replace('Mainland China', 'China')

# filling missing values 
covid_19[['Province/State']] = covid_19[['Province/State']].fillna('')
covid_19[cases] = covid_19[cases].fillna(0)

# fixing datatypes
covid_19['Recovered'] = covid_19['Recovered'].astype(int)

Next, we get the latest date and sum all countries to find confirmed cases, deaths and mortality rate globally.

The overall reported mortality rate is about 4.9% (0.049 as a fraction of confirmed cases) as of 31/3/2020.

In [5]:
# latest
latest = covid_19[covid_19['Date'] == max(covid_19['Date'])].reset_index()

# latest condensed
latest_grouped = latest.groupby('Country/Region')[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum().reset_index()
In [6]:
total = covid_19.groupby(['Country/Region', 'Province/State'])[['Confirmed', 'Deaths', 'Recovered', 'Active']].max()
In [7]:
# global totals per date, keeping only the latest date
total = covid_19.groupby('Date')[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum().reset_index()
total = total[total['Date']==max(total['Date'])].reset_index(drop=True)
# mortality as a fraction of confirmed cases, and per 100 confirmed cases
total['Global Mortality'] = total['Deaths']/total['Confirmed']
total['Deaths per 100 Confirmed Cases'] = total['Global Mortality']*100
total.style.background_gradient(cmap='inferno')
Out[7]:
Date Confirmed Deaths Recovered Active Global Mortality Deaths per 100 Confirmed Cases
0 2020-03-31 00:00:00 857487 42107 176442 638938 0.0491051 4.91051

Now we can group by countries and display in order of confirmed cases.

The US is the country with the highest number of reported infections, while Italy reports the most deaths. China reports the highest number of recovered cases, being at the most advanced stage of the contagion.

In [8]:
by_confirmed = latest_grouped.sort_values(by='Confirmed', ascending=False)
by_confirmed = by_confirmed[['Country/Region', 'Confirmed', 'Active', 'Deaths', 'Recovered']]
by_confirmed = by_confirmed.reset_index(drop=True)

by_confirmed.style.background_gradient(cmap="Blues", subset=['Confirmed'])\
            .background_gradient(cmap="Oranges", subset=['Active'])\
            .background_gradient(cmap="Greens", subset=['Recovered'])\
            .background_gradient(cmap="Reds", subset=['Deaths'])
Out[8]:
Country/Region Confirmed Active Deaths Recovered
0 US 188172 177275 3873 7024
1 Italy 105792 77635 12428 15729
2 Spain 95923 68200 8464 19259
3 China 82279 2764 3309 76206
4 Germany 71808 54933 775 16100
5 France 52827 39782 3532 9513
6 Iran 44605 27051 2898 14656
7 United Kingdom 25481 23509 1793 179
8 Switzerland 16605 14349 433 1823
9 Turkey 13531 13074 214 243
10 Belgium 12775 10374 705 1696
11 Netherlands 12667 11374 1040 253
12 Austria 10180 8957 128 1095
13 South Korea 9786 4216 162 5408
14 Canada 8527 8426 101 0
15 Portugal 7443 7240 160 43
16 Brazil 5717 5389 201 127
17 Israel 5358 5114 20 224
18 Norway 4641 4589 39 13
19 Australia 4559 4183 18 358
20 Sweden 4435 4239 180 16
21 Czechia 3308 3232 31 45
22 Ireland 3235 3159 71 5
23 Denmark 3039 2872 90 77
24 Malaysia 2766 2186 43 537
25 Chile 2738 2570 12 156
26 Russia 2337 2199 17 121
27 Poland 2311 2271 33 7
28 Romania 2245 1943 82 220
29 Ecuador 2240 2111 75 54
30 Luxembourg 2178 2075 23 80
31 Philippines 2084 1947 88 49
32 Japan 1953 1473 56 424
33 Pakistan 1938 1836 26 76
34 Thailand 1651 1299 10 342
35 Saudi Arabia 1563 1388 10 165
36 Indonesia 1528 1311 136 81
37 Finland 1418 1391 17 10
38 India 1397 1239 35 123
39 South Africa 1353 1317 5 31
40 Greece 1314 1213 49 52
41 Panama 1181 1142 30 9
42 Iceland 1135 935 2 198
43 Dominican Republic 1109 1053 51 5
44 Mexico 1094 1031 28 35
45 Peru 1065 641 30 394
46 Argentina 1054 787 27 240
47 Singapore 926 683 3 240
48 Colombia 906 859 16 31
49 Serbia 900 884 16 0
50 Croatia 867 794 6 67
51 Slovenia 802 777 15 10
52 Qatar 781 717 2 62
53 Estonia 745 715 4 26
54 Algeria 716 626 44 46
55 Diamond Princess 712 99 10 603
56 Egypt 710 507 46 157
57 Iraq 694 474 50 170
58 United Arab Emirates 664 597 6 61
59 New Zealand 647 572 1 74
60 Ukraine 645 618 17 10
61 Morocco 617 557 36 24
62 Bahrain 567 268 4 295
63 Lithuania 537 522 8 7
64 Armenia 532 499 3 30
65 Hungary 492 439 16 37
66 Lebanon 470 421 12 37
67 Bosnia and Herzegovina 420 390 13 17
68 Bulgaria 399 374 8 17
69 Latvia 398 397 0 1
70 Tunisia 394 381 10 3
71 Andorra 376 354 12 10
72 Slovakia 363 360 0 3
73 Moldova 353 331 4 18
74 Costa Rica 347 341 2 4
75 Kazakhstan 343 317 2 24
76 Uruguay 338 296 1 41
77 North Macedonia 329 308 9 12
78 Taiwan* 322 278 5 39
79 Azerbaijan 298 267 5 26
80 Kuwait 289 216 0 73
81 Jordan 274 239 5 30
82 Cyprus 262 231 8 23
83 Burkina Faso 261 215 14 32
84 Albania 243 176 15 52
85 San Marino 236 197 26 13
86 Vietnam 212 154 0 58
87 Cameroon 193 182 6 5
88 Oman 192 157 1 34
89 Cuba 186 172 6 8
90 Cote d'Ivoire 179 171 1 7
91 Senegal 175 135 0 40
92 Afghanistan 174 165 4 5
93 Uzbekistan 172 163 2 7
94 Malta 169 167 0 2
95 Ghana 161 125 5 31
96 Belarus 152 104 1 47
97 Sri Lanka 143 124 2 17
98 Mauritius 143 138 5 0
99 Honduras 141 131 7 3
100 Nigeria 135 125 2 8
101 Venezuela 135 93 3 39
102 Brunei 129 83 1 45
103 West Bank and Gaza 119 100 1 18
104 Kosovo 112 105 1 6
105 Georgia 110 89 0 21
106 Montenegro 109 107 2 0
107 Cambodia 109 86 0 23
108 Kyrgyzstan 107 104 0 3
109 Bolivia 107 101 6 0
110 Congo (Kinshasa) 98 88 8 2
111 Trinidad and Tobago 87 83 3 1
112 Rwanda 75 75 0 0
113 Liechtenstein 68 68 0 0
114 Paraguay 65 61 3 1
115 Kenya 59 57 1 1
116 Madagascar 57 57 0 0
117 Monaco 52 49 1 2
118 Bangladesh 51 21 5 25
119 Uganda 44 44 0 0
120 Guatemala 38 25 1 12
121 Jamaica 36 33 1 2
122 Zambia 35 35 0 0
123 Togo 34 23 1 10
124 Barbados 34 34 0 0
125 El Salvador 32 31 1 0
126 Djibouti 30 30 0 0
127 Mali 28 26 2 0
128 Niger 27 24 3 0
129 Ethiopia 26 24 0 2
130 Guinea 22 22 0 0
131 Congo (Brazzaville) 19 19 0 0
132 Tanzania 19 17 1 1
133 Maldives 18 5 0 13
134 Gabon 16 15 1 0
135 Burma 15 14 1 0
136 Eritrea 15 15 0 0
137 Haiti 15 14 0 1
138 Bahamas 14 13 0 1
139 Saint Lucia 13 12 0 1
140 Dominica 12 12 0 0
141 Mongolia 12 10 0 2
142 Guyana 12 10 2 0
143 Equatorial Guinea 12 11 0 1
144 Namibia 11 9 0 2
145 Libya 10 9 0 1
146 Syria 10 8 2 0
147 Seychelles 10 10 0 0
148 Laos 9 9 0 0
149 Suriname 9 9 0 0
150 Eswatini 9 9 0 0
151 Benin 9 8 0 1
152 Grenada 9 9 0 0
153 Mozambique 8 8 0 0
154 Guinea-Bissau 8 8 0 0
155 Zimbabwe 8 7 1 0
156 Saint Kitts and Nevis 8 8 0 0
157 Chad 7 7 0 0
158 Angola 7 4 2 1
159 Antigua and Barbuda 7 7 0 0
160 Sudan 7 4 2 1
161 Cabo Verde 6 5 1 0
162 Mauritania 6 3 1 2
163 Holy See 6 6 0 0
164 Somalia 5 4 0 1
165 Nicaragua 5 4 1 0
166 Fiji 5 5 0 0
167 Nepal 5 4 0 1
168 Botswana 4 3 1 0
169 Gambia 4 3 1 0
170 Bhutan 4 4 0 0
171 Central African Republic 3 3 0 0
172 Belize 3 3 0 0
173 Liberia 3 3 0 0
174 Burundi 2 2 0 0
175 MS Zaandam 2 2 0 0
176 Saint Vincent and the Grenadines 1 0 0 1
177 Sierra Leone 1 1 0 0
178 Papua New Guinea 1 1 0 0
179 Timor-Leste 1 1 0 0

Let's do the same, this time only displaying overall deaths and calculating the mortality per country as:

Mortality rate = number of deaths / number of confirmed

The reported mortality varies greatly between countries, with Italy at almost 12% and Germany at about 1% being the two extremes among the larger European countries.

In [9]:
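# keep only countries with at least one death; copy to avoid assigning to a view
by_deaths = by_confirmed[by_confirmed['Deaths']>0][['Country/Region', 'Deaths']].copy()
# deaths per 100 confirmed cases (rows align with by_confirmed on the index)
by_deaths['Deaths / 100 Cases'] = round((by_confirmed['Deaths']/by_confirmed['Confirmed'])*100, 2)
by_deaths.sort_values('Deaths', ascending=False).reset_index(drop=True).style.background_gradient(cmap='Reds')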
Out[9]:
Country/Region Deaths Deaths / 100 Cases
0 Italy 12428 11.75
1 Spain 8464 8.82
2 US 3873 2.06
3 France 3532 6.69
4 China 3309 4.02
5 Iran 2898 6.5
6 United Kingdom 1793 7.04
7 Netherlands 1040 8.21
8 Germany 775 1.08
9 Belgium 705 5.52
10 Switzerland 433 2.61
11 Turkey 214 1.58
12 Brazil 201 3.52
13 Sweden 180 4.06
14 South Korea 162 1.66
15 Portugal 160 2.15
16 Indonesia 136 8.9
17 Austria 128 1.26
18 Canada 101 1.18
19 Denmark 90 2.96
20 Philippines 88 4.22
21 Romania 82 3.65
22 Ecuador 75 3.35
23 Ireland 71 2.19
24 Japan 56 2.87
25 Dominican Republic 51 4.6
26 Iraq 50 7.2
27 Greece 49 3.73
28 Egypt 46 6.48
29 Algeria 44 6.15
30 Malaysia 43 1.55
31 Norway 39 0.84
32 Morocco 36 5.83
33 India 35 2.51
34 Poland 33 1.43
35 Czechia 31 0.94
36 Panama 30 2.54
37 Peru 30 2.82
38 Mexico 28 2.56
39 Argentina 27 2.56
40 San Marino 26 11.02
41 Pakistan 26 1.34
42 Luxembourg 23 1.06
43 Israel 20 0.37
44 Australia 18 0.39
45 Ukraine 17 2.64
46 Russia 17 0.73
47 Finland 17 1.2
48 Serbia 16 1.78
49 Hungary 16 3.25
50 Colombia 16 1.77
51 Slovenia 15 1.87
52 Albania 15 6.17
53 Burkina Faso 14 5.36
54 Bosnia and Herzegovina 13 3.1
55 Chile 12 0.44
56 Andorra 12 3.19
57 Lebanon 12 2.55
58 Diamond Princess 10 1.4
59 Tunisia 10 2.54
60 Thailand 10 0.61
61 Saudi Arabia 10 0.64
62 North Macedonia 9 2.74
63 Cyprus 8 3.05
64 Bulgaria 8 2.01
65 Congo (Kinshasa) 8 8.16
66 Lithuania 8 1.49
67 Honduras 7 4.96
68 Cuba 6 3.23
69 United Arab Emirates 6 0.9
70 Cameroon 6 3.11
71 Croatia 6 0.69
72 Bolivia 6 5.61
73 Ghana 5 3.11
74 Taiwan* 5 1.55
75 Jordan 5 1.82
76 Azerbaijan 5 1.68
77 South Africa 5 0.37
78 Bangladesh 5 9.8
79 Mauritius 5 3.5
80 Afghanistan 4 2.3
81 Estonia 4 0.54
82 Bahrain 4 0.71
83 Moldova 4 1.13
84 Trinidad and Tobago 3 3.45
85 Venezuela 3 2.22
86 Paraguay 3 4.62
87 Armenia 3 0.56
88 Niger 3 11.11
89 Singapore 3 0.32
90 Angola 2 28.57
91 Guyana 2 16.67
92 Syria 2 20
93 Costa Rica 2 0.58
94 Kazakhstan 2 0.58
95 Qatar 2 0.26
96 Mali 2 7.14
97 Uzbekistan 2 1.16
98 Iceland 2 0.18
99 Nigeria 2 1.48
100 Sri Lanka 2 1.4
101 Sudan 2 28.57
102 Montenegro 2 1.83
103 Zimbabwe 1 12.5
104 Nicaragua 1 20
105 Mauritania 1 16.67
106 Cabo Verde 1 16.67
107 Botswana 1 25
108 Burma 1 6.67
109 Gabon 1 6.25
110 Tanzania 1 5.26
111 Brunei 1 0.78
112 El Salvador 1 3.12
113 Togo 1 2.94
114 Jamaica 1 2.78
115 Guatemala 1 2.63
116 Monaco 1 1.92
117 Kenya 1 1.69
118 Kosovo 1 0.89
119 West Bank and Gaza 1 0.84
120 Belarus 1 0.66
121 Cote d'Ivoire 1 0.56
122 Oman 1 0.52
123 Uruguay 1 0.3
124 New Zealand 1 0.15
125 Gambia 1 25
In [10]:
# Deaths
temp = latest_grouped[latest_grouped['Deaths']>0]
fig = px.choropleth(temp, 
                    locations="Country/Region", locationmode='country names',
                    color=np.log(temp["Deaths"]), hover_name="Country/Region", 
                    color_continuous_scale="Peach", hover_data=['Deaths'],
                    title='Countries with Deaths Reported')
fig.update(layout_coloraxis_showscale=False)
fig.show()
In [11]:
HTML('''<div class="flourish-embed flourish-bar-chart-race" data-src="visualisation/1714161" data-url="https://public.flourish.studio/visualisation/1714161/embed"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')
Out[11]:

Forecasting with Prophet

Prophet is a forecasting procedure implemented in R and Python. At its core, the Prophet procedure is an additive regression model.
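
To make "additive" concrete, the sketch below builds a synthetic daily series as the sum of a trend, a weekly pattern and noise. It is not Prophet's implementation; the names `trend`, `weekly` and `noise` and all numeric values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)               # fixed seed so the sketch is reproducible
t = np.arange(70)                            # 70 daily time steps (hypothetical)

trend = 50.0 * t                             # steadily growing baseline
weekly = 100.0 * np.sin(2 * np.pi * t / 7)   # pattern repeating every 7 days
noise = rng.normal(scale=20.0, size=t.size)  # irregular fluctuations

# An additive model assumes the observed series is the sum of its components.
y = trend + weekly + noise
```

Fitting such a model means estimating each component from `y` alone, which is what the component plots later in this notebook display.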

We will predict the curve of infections and deaths for the UK and for the USA. The two countries differ a lot in terms of population size, isolation measures and social distancing, with the UK adopting an approach closer to Italy's and the US adopting less restrictive measures.

[Image: exponential.jpeg]

In [12]:
from fbprophet import Prophet  # from version 1.0 onwards the package is named `prophet`

Let's examine how our features will evolve over the next 30 days, starting with the number of confirmed cases in the UK.

In [13]:
data = ['Confirmed', 'Deaths', 'Recovered', 'Active']

# select the UK rows and sum the case columns over all its provinces for each date
df = covid_19.loc[covid_19['Country/Region'] == 'United Kingdom']
df_uk = df.groupby('Date')[data].sum().reset_index()

df_uk['Date'] = pd.to_datetime(df_uk['Date'])
df_uk.head()
Out[13]:
Date Confirmed Deaths Recovered Active
0 2020-01-22 0 0 0 0
1 2020-01-23 0 0 0 0
2 2020-01-24 0 0 0 0
3 2020-01-25 0 0 0 0
4 2020-01-26 0 0 0 0

By default, Prophet uses a linear model to forecast. When forecasting growth, there is usually some maximum achievable point: total market size, total population size, the reach of a virus, etc. This is called the carrying capacity, and the forecast should saturate at this point.
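
As a rough illustration of saturating growth, the sketch below evaluates a plain logistic curve that levels off at a carrying capacity. Prophet's logistic trend is more elaborate (it lets the growth rate change at changepoints), so this is only the textbook form; the helper `logistic_curve` and the values of `rate` and `midpoint` are made up for the example.

```python
import numpy as np

def logistic_curve(t, cap, rate, midpoint):
    """Textbook logistic growth: cap / (1 + exp(-rate * (t - midpoint)))."""
    t = np.asarray(t, dtype=float)
    return cap / (1.0 + np.exp(-rate * (t - midpoint)))

days = np.arange(0, 121)
# Hypothetical parameters: cap of 150,000 and inflection point at day 60.
curve = logistic_curve(days, cap=150_000, rate=0.15, midpoint=60)

print(curve[60])   # exactly half of the cap at the midpoint
print(curve[-1])   # close to, but never above, the cap
```

Whatever the growth rate, the curve cannot exceed `cap`, which is why the choice of cap largely determines where a logistic forecast flattens.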

The UK has a population size comparable to Italy's, and similar precautionary measures were taken to increase social distancing and enforce a lockdown. For these reasons, and because the measures were taken at a similar stage of the contagion, we are going to assume (and hope) that the total number of cases will level off between 100,000 and 150,000.

In [14]:
uk_confirmed = df_uk[['Date', 'Confirmed']].rename(columns={'Date': 'ds', 'Confirmed': 'y'})
uk_deaths = df_uk[['Date', 'Deaths']].rename(columns={'Date': 'ds', 'Deaths': 'y'})

uk_confirmed['cap'] = 150000
In [15]:
m = Prophet(growth='logistic')
m.fit(uk_confirmed)
future = m.make_future_dataframe(periods=30)
future['cap'] = 150000
future.tail()
INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
/Users/matteofusilli/opt/anaconda3/lib/python3.7/site-packages/pystan/misc.py:399: FutureWarning:

Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.

Out[15]:
ds cap
95 2020-04-26 150000
96 2020-04-27 150000
97 2020-04-28 150000
98 2020-04-29 150000
99 2020-04-30 150000

Finally, let's predict the curve of confirmed infections for the next 30 days.

In [16]:
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
Out[16]:
ds yhat yhat_lower yhat_upper
95 2020-04-26 144853.814490 144546.892164 145177.133303
96 2020-04-27 145629.083046 145314.788594 145907.147360
97 2020-04-28 146271.272024 145964.019703 146583.248655
98 2020-04-29 146984.077200 146677.625925 147305.075473
99 2020-04-30 147451.638456 147122.755319 147754.623178
In [17]:
fig1 = m.plot(forecast)

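If the model is correct, the UK's cumulative confirmed cases might approach their plateau by the beginning of May and stabilise thereafter. Note that the level of the plateau is set by the cap we assumed, not learned from the data.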

In [18]:
fig2 = m.plot_components(forecast)

The weekly component suggests a peak in newly recorded cases towards the end of the week. This might be because hospitals usually take some time to investigate and record a new case, or a new death, so we should expect a similar pattern for recorded deaths.
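
One way to check a weekday effect directly, without relying on Prophet's component plot, is to difference the cumulative counts and average the daily increases by day of the week. The sketch below uses a small synthetic series (invented numbers, not the UK data); the frame `demo` and its columns are placeholders that mirror the shape of `df_uk`.

```python
import pandas as pd

# Synthetic cumulative counts for 28 days (hypothetical, for illustration only):
# 10 new cases a day, plus 5 extra on Fridays.
dates = pd.date_range('2020-03-02', periods=28, freq='D')   # starts on a Monday
new_cases = [15 if d.day_name() == 'Friday' else 10 for d in dates]
demo = pd.DataFrame({'Date': dates,
                     'Confirmed': pd.Series(new_cases).cumsum().to_numpy()})

# Daily increase = difference between consecutive cumulative totals.
demo['New'] = demo['Confirmed'].diff()
# Average daily increase per weekday; the first row is NaN and is skipped by mean().
by_weekday = demo.groupby(demo['Date'].dt.day_name())['New'].mean()
print(by_weekday.sort_values(ascending=False))
```

On real data the differences would be noisier, so a weekday pattern should be read with caution when only a few weeks are available.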

In [19]:
uk_deaths['cap'] = 10500

At the current mortality rate of ~7%, and with confirmed cases capped at 150,000, we cap the UK's deaths at 10,500 (7% of 150,000). The curve indicates that the UK is set to reach approximately 5,000 deaths by next week. It is important to note that this growth could slow if everybody respects the lockdown, limiting the number of new infections.
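
The cap on deaths appears to follow from applying the mortality rate to the cap on confirmed cases used earlier. A quick check, assuming the rate stays at about 7% (a working assumption of this notebook, not a prediction):

```python
case_cap = 150_000        # cap on UK confirmed cases used for the first model
mortality_rate = 0.07     # approximate UK deaths / confirmed cases

# Deaths cap = mortality rate applied to the cap on confirmed cases.
death_cap = round(case_cap * mortality_rate)
print(death_cap)  # 10500
```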

In [20]:
m = Prophet(growth='logistic')
m.fit(uk_deaths)


future = m.make_future_dataframe(periods=14)
future['cap'] = 10500
fcst = m.predict(future)
fig = m.plot(fcst)

Now let's examine the USA. This country has a much larger population and a higher rate of infections. The US is currently applying social distancing but has not adopted the same restrictions, or lockdowns, as the UK or other European countries. It is estimated that up to 200,000 people may die there as a result of covid-19, and potentially millions will be infected.

For this reason we set a deliberately high cap of 1,500,000 confirmed cases, so that the cap has limited influence on the forecast, and we only predict a short horizon of 10 days.

In [21]:
data = ['Confirmed', 'Deaths', 'Recovered', 'Active']

# select the US rows and sum the case columns for each date
df = covid_19.loc[covid_19['Country/Region'] == 'US']
df_us = df.groupby('Date')[data].sum().reset_index()

df_us['Date'] = pd.to_datetime(df_us['Date'])
df_us.tail()
Out[21]:
Date Confirmed Deaths Recovered Active
65 2020-03-27 101657 1581 869 99207
66 2020-03-28 121478 2026 1072 118380
67 2020-03-29 140886 2467 2665 135754
68 2020-03-30 161807 2978 5644 153185
69 2020-03-31 188172 3873 7024 177275
In [22]:
us_confirmed = df_us[['Date', 'Confirmed']].rename(columns={'Date': 'ds', 'Confirmed': 'y'})
us_deaths = df_us[['Date', 'Deaths']].rename(columns={'Date': 'ds', 'Deaths': 'y'})

us_confirmed['cap'] = 1500000

us_confirmed.tail()
Out[22]:
ds y cap
65 2020-03-27 101657 1500000
66 2020-03-28 121478 1500000
67 2020-03-29 140886 1500000
68 2020-03-30 161807 1500000
69 2020-03-31 188172 1500000
In [23]:
m = Prophet(growth='logistic')
m.fit(us_confirmed)
future = m.make_future_dataframe(periods=10)
future['cap'] = 1500000
future.tail()

Out[23]:
ds cap
75 2020-04-06 1500000
76 2020-04-07 1500000
77 2020-04-08 1500000
78 2020-04-09 1500000
79 2020-04-10 1500000
In [24]:
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
Out[24]:
ds yhat yhat_lower yhat_upper
75 2020-04-06 512465.835955 509057.473399 515820.394249
76 2020-04-07 583102.744831 579563.241140 586725.273739
77 2020-04-08 658504.790872 655034.856975 662138.101318
78 2020-04-09 735590.249427 732043.756658 739066.119099
79 2020-04-10 812544.755942 809189.532259 816253.809720
In [25]:
fig1 = m.plot(forecast)

Regarding the number of infections, the US could reach half a million confirmed cases as early as next week, unless social distancing takes effect sooner, which most consider unlikely.
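
As a rough cross-check on that figure, independent of Prophet, one can compute the average daily growth factor over the last few reported days and compound it forward. This constant-growth extrapolation is a simplifying assumption, not the model used above; the counts are copied from the Out[21] table.

```python
import math

# US cumulative confirmed cases, 27 to 31 March 2020 (from the Out[21] table).
confirmed = [101657, 121478, 140886, 161807, 188172]

# Geometric mean of the daily growth factor over the 4 intervals.
daily_factor = (confirmed[-1] / confirmed[0]) ** (1 / (len(confirmed) - 1))
doubling_days = math.log(2) / math.log(daily_factor)

# Naive projection 6 days ahead (6 April), assuming the factor stays constant.
projected = confirmed[-1] * daily_factor ** 6
print(f"factor={daily_factor:.3f}, doubling={doubling_days:.1f} days, projected={projected:,.0f}")
```

With these numbers the naive projection for 6 April comes out a little under half a million, broadly in line with the forecast table.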

In a recent interview, Trump claimed he wanted to keep the number of deaths below 200,000. We will instead use the much lower value of 20,000 as a cap, so that the death curve can be compared directly with the UK's, and because the US currently has a mortality rate of about 2%.

In [26]:
us_deaths['cap'] = 20000
m = Prophet(growth='logistic')
m.fit(us_deaths)


future = m.make_future_dataframe(periods=14)
future['cap'] = 20000
fcst = m.predict(future)
fig = m.plot(fcst)

As we can see, the rise is much sharper, with 5,000 fatalities by the second of April and 7,500 only a few days later.

Limitations

There are several limitations in the model above:

  • The lockdowns and social distancing recommendations adopted by many countries around the world should slow down the rate of new infections and deaths reported. This usually shows in the curve with a delay of 2 to 3 weeks from the moment the measures were adopted, so the predictions may overestimate the actual values, depending on how strict these measures are.

  • Different countries report new infections and deaths differently. The way tests are carried out can vary a lot, and hospital capacity is a major factor that can greatly affect the mortality rate and should be accounted for.

Conclusion

We examined a dataset reporting the total confirmed covid-19 cases and deaths globally and per country. We were able to determine the overall mortality rate and observe large differences among countries and continents.

The results seem to suggest that social distancing and national lockdowns have a large impact on the total number of infections and deaths, provided these measures are taken in time.

Finally, we compared two countries, the US and the UK, keeping in mind the limitations of a logistic model. Despite their very different population sizes, densities, government guidelines and social distancing measures, we were able to plot the curves of infections and deaths and forecast these values for the next 10 to 30 days.
